Data Extraction for preselected commodities portfolio

Author

Rodrigo Hermont Ozon, Érick Oliveira Rodrigues

Published

Invalid Date

Code
start_time <- Sys.time()

Abstract

This small document have the goal to share the time series extraction and the two basic features building, like price returns and their conditional variance…



Intro

[… to be written …]


Python codes

Code

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from arch import arch_model

Loading time series data, for portfolio setting…

Code

# Tickers for portfolio
TICKERS = [
    "ZC=F",  # Corn Futures
    "ZO=F",  # Wheat Futures
    "KE=F",  # KC HRW Wheat Futures
    "ZR=F",  # Rough Rice Futures
    "GF=F",  # Feeder Cattle Futures
    "ZS=F",  # SoyMeal Futures
    "ZM=F",  # Soybean Meal Futures
    "ZL=F"   # SoyBeans Futures
]

Obtain daily prices and their returns:

Code
# Downloading data from Yahoo Finance
portfolio_prices = yf.download(TICKERS, start="2019-01-01")['Adj Close']

[                       0%                       ]
[************          25%                       ]  2 of 8 completed
[******************    38%                       ]  3 of 8 completed
[**********************50%                       ]  4 of 8 completed
[**********************62%*****                  ]  5 of 8 completed
[**********************75%***********            ]  6 of 8 completed
[**********************88%*****************      ]  7 of 8 completed
[*********************100%***********************]  8 of 8 completed
Code
portfolio_prices.dropna(inplace=True)

# Renaming columns for better readability
portfolio_prices.columns = [
    "corn_fut",
    "wheat_fut",
    "KCWheat_fut",
    "rice_fut",
    "Feeder_Cattle",
    "soymeal_fut",
    "soyF_fut",
    "soybeans_fut"
]

Plotting the time series prices (in level):

Code
# Plot time series of portfolio prices one by one
for column in portfolio_prices.columns:
    plt.figure(figsize=(10, 6))
    sns.lineplot(x=portfolio_prices.index, y=portfolio_prices[column])
    plt.title(f'Time Series of {column} Prices')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.grid(True)
    plt.show()

Obtain the returns time series (first feature):

\[ \mbox{Price log returns}_t = ln(p_t) - ln(p_{t-1}) \]

Code

# Calculate log returns
portfolio_log_returns = np.log(portfolio_prices / portfolio_prices.shift(1)).dropna()
portfolio_log_returns.columns = [
    "ret_corn_fut",
    "ret_wheat_fut",
    "ret_KCWheat_fut",
    "ret_rice_fut",
    "ret_Feeder_Cattle",
    "ret_soymeal_fut",
    "ret_soyF_fut",
    "ret_soybeans_fut"
]

And plot it:

Code
# Plot time series of log returns one by one
for column in portfolio_log_returns.columns:
    plt.figure(figsize=(10, 6))
    sns.lineplot(x=portfolio_log_returns.index, y=portfolio_log_returns[column])
    plt.title(f'Time Series of {column} Log Returns')
    plt.xlabel('Date')
    plt.ylabel('Log Returns')
    plt.grid(True)
    plt.show()

The GARCH(1,1) model with an asymmetric Student-t distribution is not directly available in most Python libraries. However, we can still use a GARCH(1,1) model with a standard Student-t distribution to estimate the conditional variance. The GARCH(1,1) model is represented as follows:

\[ r_t = \mu + \epsilon_t \]

\[ \epsilon_t = \sigma_t z_t, \quad z_t \sim t_{\nu}(0, 1) \]

\[ \sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \]

Where:

  • \(r_t\) is the return at time \(t\).
  • \(\mu\) is the mean of the returns.
  • \(\epsilon_t\) is the error term, modeled as conditional on past information.
  • \(\sigma_t^2\) is the conditional variance at time \(t\).
  • \(\omega, \alpha, \beta\) are the parameters to be estimated, with \(\omega > 0, \alpha \geq 0, \beta \geq 0\).
  • \(z_t\) follows a Student-t distribution with \(\nu\) degrees of freedom to capture the heavy tails observed in financial returns.
Code
# Estimate GARCH(1,1) model for each asset and extract conditional variances
conditional_variances = pd.DataFrame(index=portfolio_log_returns.index)

for column in portfolio_log_returns.columns:
    model = arch_model(portfolio_log_returns[column], vol='Garch', p=1, q=1, dist='t')
    model_fitted = model.fit(disp='off')
    conditional_variances[column] = model_fitted.conditional_volatility ** 2

# Plot time series of conditional variances one by one
for column in conditional_variances.columns:
    plt.figure(figsize=(10, 6))
    sns.lineplot(x=conditional_variances.index, y=conditional_variances[column])
    plt.title(f'Conditional Variance of {column} (GARCH(1,1) Estimated)')
    plt.xlabel('Date')
    plt.ylabel('Conditional Variance')
    plt.grid(True)
    plt.show()


R codes

Code
library(tidyverse)
library(dplyr)
library(ggplot2)
#library(plotly)
library(rugarch)
library(timeSeries)
library(fPortfolio)
library(quantmod)
library(caTools)
library(PerformanceAnalytics)
library(MASS)
library(PortfolioAnalytics)
library(ROI)
require(ROI.plugin.glpk)
require(ROI.plugin.quadprog)
library(quadprog)
library(corpcor)
library(DEoptim)
library(cowplot) # devtools::install_github("wilkelab/cowplot/")
library(lattice)
library(timetk)

Loading time series data, for portfolio setting…

Code
tickers <- c(
         "ZC=F", # Corn Futures
         "ZO=F", # Wheat Futures
         "KE=F", # Futuros KC HRW Wheat Futures
         "ZR=F", # Rough Rice Futures
         "GF=F", # Feeder Cattle Futures
         "ZS=F", # SoyMeal Futures 
         "ZM=F", # Futuros farelo soja
         "ZL=F"  # SoyBeans Futures
)

Obtain daily prices and their returns:

Code
portfolioPrices <- NULL
  for ( Ticker in tickers )
    portfolioPrices <- cbind(
      portfolioPrices, 
      getSymbols.yahoo(
        Ticker,
        from = "2019-01-01",
        auto.assign = FALSE
      )[,4]
    )

portfolioPrices <- portfolioPrices[apply(portfolioPrices, 1, function(x) all(!is.na(x))),]

colnames(portfolioPrices) <- c(
  "corn_fut",
  "wheat_fut",
  "KCWheat_fut",
  "rice_fut",
  "Feeder_Cattle",
  "soymeal_fut",
  "soyF_fut",
  "soybeans_fut"
)

tail(portfolioPrices)
           corn_fut wheat_fut KCWheat_fut rice_fut Feeder_Cattle soymeal_fut
2024-09-27   418.00    384.75      576.75   1508.0       247.075     1065.75
2024-09-30   424.75    392.50      583.75   1529.5       246.200     1057.00
2024-10-01   429.00    388.00      598.25   1530.0       246.150     1057.25
2024-10-02   432.50    389.75      619.25   1516.5       249.725     1056.00
2024-10-03   428.25    383.75      611.50   1517.0       248.975     1046.00
2024-10-04   424.75    388.25      598.00   1509.5       249.625     1037.75
           soyF_fut soybeans_fut
2024-09-27    343.7        42.18
2024-09-30    344.3        43.51
2024-10-01    350.0        42.91
2024-10-02    341.4        43.72
2024-10-03    332.8        44.69
2024-10-04    330.5        44.04

Plotting the time series prices (in level):

Code
portfolioPrices |> as.data.frame() |>
  mutate(
    time = seq_along( corn_fut )
  ) |>
  pivot_longer(
    !time,
    names_to = "Variables",
    values_to = "Value"  
      ) |>
  group_by(Variables) |>
  plot_time_series(
    time,
    Value,
    .interactive = F, # Change for TRUE for better visualization
    .facet_ncol = 2,
    .smooth = FALSE
  ) +
  theme(
    strip.background = element_rect(fill = "white", colour = "white")
  )

Obtain the returns time series (first feature):

\[ \mbox{Price log returns}_t = ln(p_t) - ln(p_{t-1}) \]

Code
# Calculate log returns for the portfolio prices
portfolioReturs <- na.omit(diff(log(portfolioPrices))) |> as.data.frame()

colnames(portfolioReturs) <- c(
  "ret_corn_fut",
  "ret_wheat_fut",
  "ret_KCWheat_fut",
  "ret_rice_fut",
  "ret_Feeder_Cattle",
  "ret_soymeal_fut",
  "ret_soyF_fut",
  "ret_soybeans_fut"
)

glimpse(portfolioReturs)
Rows: 1,449
Columns: 8
$ ret_corn_fut      <dbl> 0.0105891128, 0.0085218477, -0.0019601444, -0.005903…
$ ret_wheat_fut     <dbl> 0.0008980692, 0.0053715438, -0.0017873106, 0.0115608…
$ ret_KCWheat_fut   <dbl> 0.0220892515, 0.0049529571, -0.0059464992, 0.0039682…
$ ret_rice_fut      <dbl> 0.0083930387, 0.0053934919, 0.0174507579, 0.00813596…
$ ret_Feeder_Cattle <dbl> -0.0096783375, -0.0111522135, 0.0075628145, 0.011068…
$ ret_soymeal_fut   <dbl> 0.0061281529, 0.0102224954, 0.0030190774, -0.0065988…
$ ret_soyF_fut      <dbl> 0.0054513913, 0.0076457646, 0.0097900861, -0.0018874…
$ ret_soybeans_fut  <dbl> 0.0099858421, 0.0081286732, -0.0052938051, -0.002834…
Code
#portfolioReturs <- as.timeSeries(portfolioReturs)

Plot all time series and their returns:

Code
portfolioReturs |> 
  mutate(
    time = seq_along( ret_corn_fut )
  ) |>
  pivot_longer(
    !time,
    names_to = "Variables",
    values_to = "Value"  
      ) |>
  group_by(Variables) |>
  plot_time_series(
    time,
    Value,
    .interactive = F, # Change for TRUE for better visualization
    .facet_ncol = 2,
    .smooth = FALSE
  ) +
  theme(
    strip.background = element_rect(fill = "white", colour = "white")
  )

Plotting the histograms:

Code
portfolioPrices_df <- as_tibble(portfolioPrices, rownames = "date")
portfolioPrices_df$date <- ymd(portfolioPrices_df$date)

portfolioReturs_df <- na.omit( ROC( portfolioPrices ), type = "discrete" ) |>
  as_tibble(rownames = "date")
portfolioReturs_df$date <- ymd(portfolioReturs_df$date)
colnames(portfolioReturs_df) <- c(
  "date",
  "ret_corn_fut",
  "ret_wheat_fut",
  "ret_KCWheat_fut",
  "ret_rice_fut",
  "ret_Feeder_Cattle",
  "ret_soymeal_fut",
  "ret_soyF_fut",
  "ret_soybeans_fut"
)

# Remover a coluna com nome NA
portfolioReturs_df <- portfolioReturs_df[, !is.na(colnames(portfolioReturs_df))]

# Verificar novamente os nomes das colunas para garantir que estão corretos
colnames(portfolioReturs_df)
[1] "date"              "ret_corn_fut"      "ret_wheat_fut"    
[4] "ret_KCWheat_fut"   "ret_rice_fut"      "ret_Feeder_Cattle"
[7] "ret_soymeal_fut"   "ret_soyF_fut"      "ret_soybeans_fut" 
Code
portfolioReturs_long <- portfolioReturs_df |> 
  pivot_longer(
    cols = -date, # Exclui a coluna de data
    names_to = "fut_type", 
    values_to = "returns"
  )

ggplot(portfolioReturs_long, aes(x = returns)) + 
  geom_histogram(aes(y = ..density..), binwidth = .01, color = "black", fill = "white") +
  geom_density(alpha = .2, fill="lightgray") +
  theme_minimal() +
  theme(
    axis.line  = element_line(colour = "black"),
    axis.text  = element_text(colour = "black"),  
    axis.ticks = element_line(colour = "black"), 
    legend.position = c(.1,.9), 
    panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank()
  ) +
  theme(plot.title   = element_text(size = 10),  
        axis.title.x = element_text(size = 7), 
        axis.title.y = element_text(size = 7)) + 
  labs(x = "Returns", y = "Density") +
  facet_wrap(~fut_type, scales = "free", ncol = 2) 

And finnaly, the last feature, is called, the conditional variance (risk measure), obtained by GARCH(1,1) model, formalized as:

The GARCH(1,1) model with asymmetric Student-t distribution can be represented mathematically as:

\[ r_t = \mu + \epsilon_t \]

\[ \epsilon_t = \sigma_t z_t, \quad z_t \sim t_{\nu}(0, 1) \]

\[ \sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \]

Where:

  • \(r_t\) is the return at time \(t\).
  • \(\mu\) is the mean of the returns.
  • \(\epsilon_t\) is the error term, modeled as conditional on past information.
  • \(\sigma_t^2\) is the conditional variance at time \(t\).
  • \(\omega, \alpha, \beta\) are the parameters to be estimated, with \(\omega > 0, \alpha \geq 0, \beta \geq 0\).
  • \(z_t\) follows an asymmetric Student-t distribution with \(\nu\) degrees of freedom to better capture the heavy tails and skewness observed in financial returns.
Code
# Load necessary packages
library(rugarch)

# Define the GARCH(1,1) model specification with Student-t distribution
spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "std" # Using Student-t distribution
)

# Estimate the model for each asset in the portfolio and extract conditional variances
garch_models <- list()
conditional_variances <- list()

for (i in colnames(portfolioReturs)) {
  garch_models[[i]] <- ugarchfit(spec, data = portfolioReturs[[i]])
  conditional_variances[[i]] <- sigma(garch_models[[i]])^2
}

# Convert conditional variances list to a data frame
conditional_variances_df <- do.call(cbind, conditional_variances) %>%
  as.data.frame() %>%
  mutate(time = seq_along(conditional_variances[[1]]))

colnames(conditional_variances_df) <- c(
  "cond_var_corn_fut",
  "cond_var_wheat_fut",
  "cond_var_KCWheat_fut",
  "cond_var_rice_fut",
  "cond_var_Feeder_Cattle",
  "cond_var_soymeal_fut",
  "cond_var_soyF_fut",
  "cond_var_soybeans_fut",
  "time"
)

# Reshape data for plotting
conditional_variances_long <- conditional_variances_df %>%
  pivot_longer(!time, names_to = "Variables", values_to = "Value")

And the plot of the conditional variance (risk):

Code
conditional_variances_long |> 
  group_by(Variables) |>
  plot_time_series(
    time,
    Value,
    .interactive = F, # Change for TRUE for better visualization
    .facet_ncol = 2,
    .smooth = FALSE
  ) +
  theme(
    strip.background = element_rect(fill = "white", colour = "white")
  )

 

 


References

Gujarati, D., N. (2004) Basic Econometrics, fourth edition, The McGraw−Hill Companies

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate Data Analysis. Pearson.

Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on oct 2023.

R packages

Code
citation(package = "tidyverse")
Para citar o pacote 'tidyverse' em publicações use:

  Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
  Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
  E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
  Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
  the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
  doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.

Uma entrada BibTeX para usuários(as) de LaTeX é

  @Article{,
    title = {Welcome to the {tidyverse}},
    author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
    year = {2019},
    journal = {Journal of Open Source Software},
    volume = {4},
    number = {43},
    pages = {1686},
    doi = {10.21105/joss.01686},
  }

Code
# Total timing to compile this Quarto document

Sys.time() - start_time
Time difference of 42.78175 secs